65 research outputs found

    Propositionalisation of multiple sequence alignments using probabilistic models

    Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally, statistical models such as Profile Hidden Markov Models are used as representations; they have the advantage of providing sound, probabilistic scores. The basic idea we present in this paper is to use the structure of a Profile Hidden Markov Model for propositionalisation. This way we get a simple, extendable representation of multiple sequence alignments which facilitates further analysis by Machine Learning algorithms.

    Sequence-based protein classification: binary Profile Hidden Markov Models and propositionalisation

    Detecting similarity in biological sequences is a key element to understanding the mechanisms of life. Researchers infer potential structural, functional or evolutionary relationships from similarity. However, the concept of similarity is complex in biology. Sequences consist of different molecules with different chemical properties, have short and long distance interactions, form 3D structures and change through evolutionary processes. Amino acids are among the key molecules of life. Most importantly, a sequence of amino acids constitutes the building block for proteins, which play an essential role in cellular processes. This thesis investigates similarity amongst proteins. In this area of research there are two important and closely related classification tasks – the detection of similar proteins and the discrimination amongst them. Hidden Markov Models (HMMs) have been successfully applied to the detection task as they model sequence similarity very well. From a Machine Learning point of view these HMMs are essentially one-class classifiers trained solely on a small number of similar proteins, neglecting the vast number of dissimilar ones. Our basic assumption is that integrating this neglected information will be highly beneficial to the classification task. Thus, we transform the problem representation from a one-class to a binary one. Equipped with the necessary sound understanding of Machine Learning, especially concerning problem representation and statistically significant evaluation, our work pursues and combines two different avenues for this transformation. First, we introduce a binary HMM that discriminates significantly better than the standard one, even when only a fraction of the negative information is used. Second, we interpret the HMM as a structured graph of information. This information cannot be accessed by highly optimised standard Machine Learning classifiers as they expect a fixed length feature vector representation. Propositionalisation is a technique to transform the former representation into the latter. This thesis introduces new propositionalisation techniques. The change in representation changes the learning problem from a one-class, generative to a propositional, discriminative one. It is a common assumption that discriminative techniques are better suited for classification tasks, and our results validate this assumption. We suggest a new way to significantly improve on discriminative power and runtime by means of terminating the time-intense training of HMMs early, subsequently applying propositionalisation and classifying with a discriminative, binary learner.

    Re-using DSpace to build a repository for freshwater quality data

    This presentation describes how we are adapting DSpace to build LERNZdb, a repository for storing and disseminating New Zealand freshwater quality data. While the original intention in this project was to build a database for individual measurements, the repository model turned out to be a very good starting point for meeting the requirements. This allowed us to re-use our DSpace expertise gained from working on institutional publications repositories. After a quick introduction to LERNZdb's aims, data and users, the main emphasis of this presentation is on our DSpace modifications.

    Statistical reporting of metabolomics data : experience from a high-throughput NMR platform and epidemiological applications

    Introduction: Meta-analysis is the cornerstone of robust biomedical evidence. Objectives: We investigated whether statistical reporting practices facilitate metabolomics meta-analyses. Methods: A literature review of 44 studies that used a comparable platform. Results: Non-numeric formats were used in 31 studies. In half of the studies, less than a third of all measures were reported. Unadjusted P-values were missing from 12 studies and exact P-values from 9 studies. Conclusion: Reporting practices can be improved. We recommend (i) publishing all results as numbers, (ii) reporting effect sizes of all measured metabolites and (iii) always reporting unadjusted exact P-values.

    Meal timing, meal frequency, and breakfast skipping in adult individuals with type 1 diabetes - associations with glycaemic control

    We assessed meal timing, meal frequency, and breakfast consumption habits of adult individuals with type 1 diabetes (n = 1007) taking part in the Finnish Diabetic Nephropathy Study, and studied whether they are associated with glycaemic control. Data on dietary intake and blood glucose measurements were retrieved from food records. HbA1c was measured at the study visit. In the whole sample, four peaks of energy intake emerged. Energy intake was the greatest in the evening, followed by midday. Altogether 7% of the participants reported no energy intake between 05:00 and 09:59 (breakfast skippers). While breakfast skippers reported a lower number of meals, no difference was observed in the total energy intake between those eating and omitting breakfast. In a multivariable model, skipping breakfast was associated with higher mean blood glucose concentrations and lower odds of good glycaemic control. A median of 6 daily meals was reported. Adjusted for confounders, the number of meals was negatively associated with HbA1c and with the mean of the blood glucose measurements, but positively associated with the variability of these measurements. Our observations support the habit of a regular meal pattern, including consumption of breakfast and multiple smaller meals, for good glycaemic control in adults with type 1 diabetes. However, an increase in the blood glucose variability may additionally be expected with an increase in the number of meals eaten.

    Resistant Hypertension and Risk of Adverse Events in Individuals With Type 1 Diabetes : A Nationwide Prospective Study

    OBJECTIVE To estimate the risk of diabetic nephropathy (DN) progression, incident coronary heart disease (CHD) and stroke, and all-cause mortality associated with resistant hypertension (RH) in individuals with type 1 diabetes stratified by stages of DN, renal function, and sex. RESEARCH DESIGN AND METHODS This prospective study included a nationally representative cohort of individuals with type 1 diabetes from the Finnish Diabetic Nephropathy Study who had purchases of antihypertensive drugs at the baseline visit (±6 months; 1995-2008). Individuals (N = 1,103) were divided into three groups: 1) RH, 2) uncontrolled blood pressure (BP) but no RH, and 3) controlled BP. DN progression, cardiovascular events, and deaths were identified from the individuals' health care records and national registries until 31 December 2015. RESULTS At baseline, 18.7% of the participants had RH, while 23.4% had controlled BP. After full adjustments for clinical confounders, RH was associated with increased risk of DN progression (hazard ratio 1.95 [95% CI 1.37, 2.79], P = 0.0002), while no differences were observed in those with no RH (1.05 [0.76, 1.44], P = 0.8) compared with those who had controlled BP. The risk of incident CHD, incident stroke, and all-cause mortality was higher in individuals with RH compared with those who had controlled BP but not beyond albuminuria and reduced kidney function. Notably, in those with normo- and microalbuminuria, the risk of stroke remained higher in the RH compared with the controlled BP group (3.49 [1.20, 10.15], P = 0.02). CONCLUSIONS Our findings highlight the importance of identifying and providing diagnostic and therapeutic counseling to these very-high-risk individuals with RH.

    Genetic Risk Score Enhances Coronary Artery Disease Risk Prediction in Individuals With Type 1 Diabetes

    OBJECTIVE Individuals with type 1 diabetes are at a high lifetime risk of coronary artery disease (CAD), calling for early interventions. This study explores the use of a genetic risk score (GRS) for CAD risk prediction, compares it to established clinical markers, and investigates its performance according to age and pharmacological treatment. RESEARCH DESIGN AND METHODS This study in 3,295 individuals with type 1 diabetes from the Finnish Diabetic Nephropathy Study (467 incident CAD, 14.8 years follow-up) used three risk scores: a GRS, a validated clinical score, and their combined score. Hazard ratios (HR) were calculated with Cox regression, and model performances were compared with the Harrell C-index (C-index). RESULTS An HR of 6.7 for CAD was observed between the highest and the lowest 5th percentile of the GRS (P = 1.8 × 10⁻⁶). The performance of the GRS (C-index = 0.562) was similar to that of HbA1c (C-index = 0.563, P = 0.96 for difference), HDL (C-index = 0.571, P = 0.6), and total cholesterol (C-index = 0.594, P = 0.1). The GRS was not correlated with the clinical score (r = -0.013, P = 0.5). The combined score outperformed the clinical score (C-index = 0.820 vs. C-index = 0.813, P = 0.003). The GRS performed better in individuals below the median age (38.6 years) compared with those above (C-index = 0.637 vs. C-index = 0.546). CONCLUSIONS A GRS identified individuals at high risk of CAD and worked better in younger individuals. The GRS was also an independent risk factor for CAD, with a predictive power comparable to that of HbA1c, HDL, and total cholesterol, and when incorporated into a clinical model, modestly improved the predictions. The GRS promises early risk stratification in clinical practice by enhancing the prediction of CAD.

    Urinary metabolite profiling and risk of progression of diabetic nephropathy in 2670 individuals with type 1 diabetes

    Aims/hypothesis This prospective, observational study examines associations between 51 urinary metabolites and risk of progression of diabetic nephropathy in individuals with type 1 diabetes by employing an automated NMR metabolomics technique suitable for large-scale urine sample collections. Methods We collected 24-h urine samples for 2670 individuals with type 1 diabetes from the Finnish Diabetic Nephropathy study and measured metabolite concentrations by NMR. Individuals were followed up for 9.0 ± 5.0 years until their first sign of progression of diabetic nephropathy, end-stage kidney disease or study end. Cox regressions were performed on the entire study population (overall progression), on 1999 individuals with normoalbuminuria and 347 individuals with macroalbuminuria at baseline. Results Seven urinary metabolites were associated with overall progression after adjustment for baseline albuminuria and chronic kidney disease stage (p < 8 × 10⁻⁴): leucine (HR 1.47 [95% CI 1.30, 1.66] per 1-SD creatinine-scaled metabolite concentration), valine (1.38 [1.22, 1.56]), isoleucine (1.33 [1.18, 1.50]), pseudouridine (1.25 [1.11, 1.42]), threonine (1.27 [1.11, 1.46]) and citrate (0.84 [0.75, 0.93]). 2-Hydroxyisobutyrate was associated with overall progression (1.30 [1.16, 1.45]) and also progression from normoalbuminuria (1.56 [1.25, 1.95]). Six amino acids and pyroglutamate were associated with progression from macroalbuminuria. Conclusions/interpretation Branched-chain amino acids and other urinary metabolites were associated with the progression of diabetic nephropathy on top of baseline albuminuria and chronic kidney disease. We found differences in associations for overall progression and progression from normo- and macroalbuminuria. These novel discoveries illustrate the utility of analysing urinary metabolites in entire population cohorts.

    The Relationship Between Body Fat Distribution and Nonalcoholic Fatty Liver in Adults With Type 1 Diabetes

    OBJECTIVE Obesity, which is associated with nonalcoholic fatty liver (NAFL), has increased among people with type 1 diabetes. Therefore, we explored the associations between body fat distribution and NAFL in this population. RESEARCH DESIGN AND METHODS This study included 121 adults with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study for whom NAFL was determined by magnetic resonance imaging. Body composition was assessed by dual-energy X-ray absorptiometry. Genetic data concerning PNPLA3 rs738409 and TM6SF2 rs58542926 were available as directly genotyped polymorphisms. Associations between body fat distribution, waist-to-height ratio (WHtR), BMI, and NAFL were explored using logistic regression. A receiver operating characteristic (ROC) curve was used to determine the WHtR and BMI thresholds with the highest sensitivity and specificity to detect NAFL. RESULTS Median age was 38.5 (33-43.7) years, duration of diabetes was 21.2 (17.9-28.4) years, 52.1% were women, and the prevalence of NAFL was 11.6%. After adjusting for sex, age, duration of diabetes, and PNPLA3 rs738409, the volume (P = 0.03) and percentage (P = 0.02) of visceral adipose tissue were associated with NAFL, whereas gynoid, appendicular, and total adipose tissues were not. The area under the curve for WHtR and NAFL was larger than that for BMI and NAFL (P = 0.04). The WHtR cutoff of 0.5 showed the highest sensitivity (86%) and specificity (55%), whereas the BMI of 26.6 kg/m² showed 79% sensitivity and 57% specificity. CONCLUSIONS Visceral adipose tissue is associated with NAFL in adults with type 1 diabetes, and WHtR may be considered when screening for NAFL in this population.